[Topi, x86] Using MKL blas for quantized dense #6115
Is the MXNet+MKLDNN baseline also in int8?
@eric-haibin-lin Yes, the MXNet+MKLDNN baseline is also in int8.
It would be better to also show the performance of TVM without the MKL s8u8s32 GEMM.
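A rough NumPy reference for what an s8u8s32 GEMM computes can help when checking TVM's own integer schedule against the MKL path. This is a sketch under stated assumptions, not MKL's API: the function name is hypothetical, it models the general form C = alpha * (A + ao) * (B + bo) + beta * C + co with a single fixed offset per operand, it assumes A holds the unsigned 8-bit values and B the signed ones, and it accumulates in int32.

```python
import numpy as np


def gemm_s8u8s32_reference(a_u8, b_s8, ao=0, bo=0, co=0, alpha=1.0, beta=0.0, c=None):
    """Hypothetical reference: C = alpha * (A + ao) @ (B + bo) + beta * C + co.

    Assumes one fixed offset per operand, A holding unsigned 8-bit values
    and B holding signed 8-bit values (an assumption, not MKL's definition).
    """
    # Widen to int32 before adding offsets so nothing wraps around in 8 bits.
    a = a_u8.astype(np.int32) + ao
    b = b_s8.astype(np.int32) + bo
    # Integer matmul; the sums stay in int32.
    acc = a @ b
    if c is None:
        c = np.zeros_like(acc)
    # alpha/beta are floating-point scalars, so round the result back to int32.
    out = alpha * acc + beta * c + co
    return np.rint(out).astype(np.int32)


# Usage: 4 * 255 * 127 = 129540, far outside the int8/int16 range.
a = np.full((1, 4), 255, dtype=np.uint8)
b = np.full((4, 1), 127, dtype=np.int8)
print(gemm_s8u8s32_reference(a, b))  # [[129540]]
```

Even this tiny example shows why the accumulator has to be 32-bit: a reduction over only four u8 by s8 products already exceeds what int16 can hold.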
@TaoLv Good point, I added the latency numbers for TVM alone. Thanks for pointing it out!
@icemelon9 Can you please manage this PR?
Ping @icemelon9
While it is OK to make use of MKLDNN in this case, we should always work hard to get good integer schedules and learn from the insights, just as we did for CUDA softmax and other cases.
@tqchen Agreed. For now, my reasoning was simply to extend the MKL fallback to int8, but I agree that it will be better to focus on TVM schedules. That will require more work, since even the FP32 dense schedules are not well optimized.
LGTM
* [Topi, x86] Using MKL blas for quantized dense
* Typo
* CBLAS_OFFSET only available for MKL
* Skipping tests as GPU CI uses Openblas
* Retrigger

Co-authored-by: Ubuntu <ubuntu@ip-172-31-0-202.us-west-2.compute.internal>
This PR uses MKL for quantized dense, following the existing MKL fallback for FP32 dense.
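For reference, the quantity a quantized dense computes can be written as a short NumPy sketch. Assumptions: the function name is hypothetical (not a TVM or MKL API), the data is uint8 and the weight int8 (mirroring the mixed-sign s8u8s32 GEMM mentioned in this PR), zero points are per-tensor scalars, and the weight follows TVM's (units, in_dim) dense layout.

```python
import numpy as np


def quantized_dense_reference(data_u8, weight_s8, data_zp=0, weight_zp=0):
    """Hypothetical reference for an int8 dense layer.

    out[i, j] = sum_k (data[i, k] - data_zp) * (weight[j, k] - weight_zp)

    The weight uses a (units, in_dim) layout, so the product is data @ weight.T.
    """
    # Widen to int32 first: subtracting zero points or summing in 8 bits would overflow.
    d = data_u8.astype(np.int32) - data_zp
    w = weight_s8.astype(np.int32) - weight_zp
    return d @ w.T


data = np.array([[10, 20, 30]], dtype=np.uint8)
weight = np.array([[1, 0, -1], [2, 2, 2]], dtype=np.int8)
print(quantized_dense_reference(data, weight))              # [[-20 120]]
print(quantized_dense_reference(data, weight, data_zp=10))  # [[-20  60]]
```

In a BLAS-backed version, subtracting a zero point corresponds to passing its negation as an operand offset, which is one plausible reason offset support (CBLAS_OFFSET) matters here; this mapping is our reading, not taken from the PR's code.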
On a c5.12xlarge instance (Cascade Lake, with VNNI support), the results for BERT-base are as follows (latency in ms):
@icemelon9 @eric-haibin-lin @shoubhik
TVM alone performs poorly because we do not yet have a good integer dense schedule.